AI in the Lab: How GPT-4 is Changing Molecules and Models

Breaking Math Podcast
2.9k Plays · 17 days ago

In this episode of Breaking Math, we dive deep into the transformative power of large language models (LLMs) like GPT-4 in the fields of chemistry and materials science, based on the article "14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon" by Jablonka et al. from the Digital Discovery Journal. Discover how AI is revolutionizing scientific research with predictive modeling, lab automation, natural language interfaces, and data extraction from research papers. We explore how these models are streamlining workflows, accelerating discovery, and even reshaping education with personalized AI tutors.

Tune in to learn about real-world examples from a hackathon where scientists used LLMs to tackle some of the most pressing challenges in materials science and chemistry—and what this means for the future of scientific innovation.

Keywords: GPT-4, large language models, AI in chemistry, AI in materials science, predictive modeling, lab automation, AI in education, natural language processing, LLM hackathon, scientific research, molecular properties, Digital Discovery Journal, Jablonka

Become a patron of Breaking Math for as little as a buck a month

Follow Breaking Math on Twitter, Instagram, LinkedIn, Website, YouTube, TikTok

Follow Autumn on Twitter and Instagram

Follow Gabe on Twitter.

Become a guest here

email: [email protected]

Transcript

Introduction to LLMs and Their Impact

00:00:00
Speaker
Welcome back to Breaking Math, a podcast where we dive deep into the intersection of mathematics, science, and the cutting-edge technologies that are transforming our world. I'm your host, Gabriel Hesch, and I'm joined by my co-host, Autumn Phaneuf. Today, we are going to explore how large language models, or LLMs, like ChatGPT and GPT-4, are breaking new ground in fields like chemistry and material science.
00:00:19
Speaker
Artificial intelligence has already made waves in so many areas, language, art, problem solving, but the applications for scientific discovery, especially in these technical fields, are only just starting to be

Virtual Hackathon: Exploring LLMs in Science

00:00:30
Speaker
realized. From accelerating research to predicting molecular properties and automating scientific workflows, AI is changing the way we do science. And in this episode, we're going to look at some of the most exciting ways LLMs are revolutionizing chemistry and material science.
00:00:43
Speaker
Today's story begins with an innovative virtual hackathon where scientists and AI experts came together to find out how LLMs could be applied in these fields. We would like to thank a team at Digital Discovery Journal for sending us the article, "14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon" by Jablonka et al. Now let's explore just how one and a half days of intense collaboration sparked some groundbreaking projects, offering us a glimpse of the future.
00:01:20
Speaker
Imagine a group of researchers and AI experts logging into a virtual space from all around the world. Their goal? To brainstorm, prototype, and present ways large language models like GPT-4 can be used in chemistry and material science.

Global Collaboration: Leveraging LLMs

00:01:34
Speaker
This wasn't a typical slow-paced academic exploration.
00:01:38
Speaker
It was a rapid, high-energy hackathon where participants had just one and a half days to deliver working prototypes. Scientists from 22 institutions in eight countries participated in the collaborative experiment. Each team focused on leveraging the power of LLMs in different areas,
00:01:57
Speaker
whether it was predicting molecular properties, extracting data from scientific literature, or creating educational tools. The results were nothing short of astounding. Teams created tools and prototypes in hours, projects that would have taken months to develop using traditional methods. These projects touched on nearly every aspect of chemistry and material science,
00:02:19
Speaker
demonstrating the incredible versatility of LLMs. In fact, the success of this hackathon marks the beginning of a new era where AI will play a central role in how science is

Versatility of LLMs in Chemistry and Material Science

00:02:32
Speaker
conducted. Now let's explore some of the key projects from the hackathon and what they reveal about the future of scientific research.
00:02:39
Speaker
Predictive modeling is at the heart of much of chemistry and material science. Whether it's predicting the stability of molecules or simulating reactions, these predictions rely heavily on data-driven models. Traditionally, these models are highly specialized, and building them requires significant expertise and handcrafted algorithms.
00:02:57
Speaker
But large language models like GPT-4 are making this process more accessible and flexible. One standout project from the hackathon focused on predicting molecular energy levels. The energy required to break a molecule into its individual atoms, known as atomization energy, is a critical property in quantum chemistry. Teams used the LIFT framework, a technique that allows LLMs to interface with molecular data, to predict these energy levels. Remarkably, the models performed well, achieving an R-squared value of 0.95, which is an impressive level of accuracy. Here's what's exciting. The LLMs were able to make these predictions using only simple molecular string representations like SMILES codes. These representations don't contain the 3D structure of the molecules, which is typically crucial for accuracy.
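As a rough illustration of the setup described here, the sketch below turns a SMILES string into a text prompt/completion pair of the kind a language-interfaced fine-tuning approach might use, and computes R-squared for a set of predictions. The prompt wording, the function names, and all numbers are illustrative assumptions, not the hackathon team's code or data.

```python
from typing import Dict, Sequence


def make_record(smiles: str, energy: float) -> Dict[str, str]:
    """Format one molecule as a text prompt/completion pair (hypothetical wording)."""
    return {
        "prompt": f"What is the atomization energy of {smiles}?",
        "completion": f"{energy:.2f}",
    }


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up values purely for demonstration; these are not real atomization energies.
true_values = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(true_values, predicted)

# "CCO" is the SMILES string for ethanol; the energy value is a placeholder.
record = make_record("CCO", 1.0)
```

An R-squared close to 1 means the predictions explain most of the variance in the reference values, which is how a figure like 0.95 should be read.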
00:03:44
Speaker
Yet the models came close to the precision of more complex models. This opens up possibilities for using LLMs in large-scale molecular predictions without needing laborious data preparation. Another project, Text2Concrete, focused on predicting the compressive strength of concrete formulations. Concrete is the world's most used construction material, and optimizing its formulation to reduce CO2 emissions is an active area of research. The team used LLMs to predict the strength of different concrete mixes and even incorporated fuzzy domain knowledge, like how water-to-cement ratios affect strength.
00:04:26
Speaker
While traditional machine learning models performed better overall, the LLMs' ability to include context-specific knowledge made them a powerful tool for more nuanced predictions. Both of these projects demonstrate how LLMs can make predictive modeling more accessible, allowing scientists to generate hypotheses, conduct simulations, and even plan experiments faster than

Automating Workflows and Data Extraction with LLMs

00:04:50
Speaker
before.
00:04:50
Speaker
The power of LLMs goes beyond making predictions. They also have the potential to completely automate scientific workflows. One of the more fascinating outcomes of the hackathon was seeing how LLMs could be used as interfaces to complex tools and databases, allowing scientists to interact with these systems using natural language.
00:05:06
Speaker
Take the MAPI-LLM project, for example. This team developed a system where researchers could ask questions like, is this material stable, or what is the band gap of a particular compound? The LLM would interpret the question, pull data from material science databases, and return an answer. This kind of natural language interaction removes the technical barriers many researchers face when working with complex data sets. No longer do you need to know how to code or write complex queries to get the data that you need. Another project, sMolTalk, took this idea further by creating natural language interfaces for visualization tools. Imagine a researcher who needs to generate custom visualizations of protein structures. Instead of learning to code in JavaScript or a specific visualization tool,
00:05:51
Speaker
they can simply ask the LLM to highlight this region of the protein in red or rotate this molecule 90 degrees. The LLM takes care of generating the code and executing the commands. Automation isn't just about making life easier for researchers. It's about efficiency. By cutting out the need for glue code, the kind of code that's often written just to make different tools talk to each other, LLMs allow scientists to spend more time on what really matters: discovery and analysis.
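One way to picture a natural language interface like this is sketched below. It swaps in a simpler pattern than running model-generated code: the model is assumed to return a small JSON "tool call", which is checked against a whitelist before anything runs. The command names, the JSON shape, and the hard-coded model output are all hypothetical, and no real LLM or viewer is involved.

```python
import json
from typing import Callable, Dict, List

# A log of viewer actions; a real tool would drive a 3D viewer instead.
actions: List[str] = []


def highlight(region: str, color: str) -> None:
    actions.append(f"highlight {region} in {color}")


def rotate(axis: str, degrees: float) -> None:
    actions.append(f"rotate {degrees:g} degrees about {axis}")


# Only commands listed here may be executed.
COMMANDS: Dict[str, Callable[..., None]] = {"highlight": highlight, "rotate": rotate}


def run_tool_call(llm_output: str) -> None:
    """Parse a JSON tool call and dispatch it to a whitelisted function."""
    call = json.loads(llm_output)
    name = call["command"]
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    COMMANDS[name](**call["arguments"])


# Hard-coded stand-in for what a model might return for
# "rotate this molecule 90 degrees".
run_tool_call('{"command": "rotate", "arguments": {"axis": "y", "degrees": 90}}')
```

Restricting execution to a fixed table of commands trades flexibility for safety: the model can only trigger actions the developer has already vetted.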
00:06:25
Speaker
The scientific world is drowning in data. Every day, new research papers are published, adding to a growing mountain of information that's difficult to sift through. LLMs offer a powerful solution to this problem by automating the extraction of valuable data from scientific literature. One hackathon project, TableToJson, focused on converting data from tables in research papers into structured formats that could be used in databases and further analyses. Research papers often include critical information about material properties, experimental conditions, and outcomes in tabular form.
00:06:55
Speaker
This project used GPT-4 to read these tables and convert them into structured JSON data, which can be fed directly into computational workflows. The results were promising. The LLM successfully extracted and formatted data from different tables in a variety of papers.
00:07:10
Speaker
However, challenges arose when the tables included more complex content such as chemical notation or special characters. These issues were largely solved with careful prompting and fine-tuning, showing that LLMs can significantly reduce the time it takes to extract and structure scientific data. This ability to convert unstructured data into structured formats is a game changer.
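Because model output does not always follow the requested structure, a pipeline like this usually needs a validation step. The sketch below builds an extraction prompt and checks the returned JSON against a small set of required keys. The schema, the prompt text, and the sample response are hypothetical, and the model call itself is left out.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema for one extracted table row.
REQUIRED_KEYS = {"material", "property", "value", "unit"}


def build_prompt(table_text: str) -> str:
    """Ask for a JSON list with a fixed set of keys per row."""
    keys = ", ".join(sorted(REQUIRED_KEYS))
    return (
        "Convert the table below into a JSON list of objects "
        f"with exactly these keys: {keys}.\n\n{table_text}"
    )


def validate_rows(llm_output: str) -> List[Dict[str, Any]]:
    """Parse model output and check every row against the expected keys."""
    try:
        rows = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(rows, list):
        raise ValueError("expected a JSON list of rows")
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {i} is not a JSON object")
        missing = REQUIRED_KEYS - row.keys()
        if missing:
            raise ValueError(f"row {i} is missing keys: {sorted(missing)}")
    return rows


# Hard-coded stand-in for a model response; the values are placeholders.
sample_output = '[{"material": "A", "property": "band gap", "value": 1.5, "unit": "eV"}]'
rows = validate_rows(sample_output)
```

When validation fails, the error message can be fed back into a follow-up prompt, which is one common way to apply the "careful prompting" mentioned above.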
00:07:35
Speaker
It means that large bodies of research, once locked away in PDF files and difficult-to-use formats, can now be tapped into by AI models to accelerate discovery.
00:07:47
Speaker
It's not just researchers who stand to benefit from the advancements in LLM technology. Education is another area where AI is poised to make a significant impact. Imagine having a personalized digital tutor that can guide students through difficult concepts, answer their questions, and even create quizzes tailored to their learning pace. The iDigest project from the hackathon explored this idea by using LLMs to assist students with lecture material.
00:08:08
Speaker
The team used the Whisper model to transcribe lecture videos into text, which was then processed by an LLM to generate questions based on the content. This allowed students to engage with the material more actively, either by testing their understanding before watching the video or by getting targeted questions afterward to reinforce learning. What's exciting here is the potential for creating an infinite variety of questions, explanations, and examples. AI-powered tutors could offer students personalized, interactive learning experiences
00:08:36
Speaker
that are far beyond what traditional textbooks can provide. This technology could revolutionize how we teach complex subjects like chemistry, offering a more dynamic and engaging way to learn.
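The second half of that pipeline, going from transcript text to question prompts, can be sketched as follows. It assumes the lecture has already been transcribed to plain text upstream; the chunk size, the prompt wording, and the function names are illustrative assumptions, and the model call is omitted.

```python
from typing import List


def chunk_transcript(text: str, max_words: int = 50) -> List[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def question_prompt(chunk: str, n_questions: int = 3) -> str:
    """Build a prompt asking for quiz questions about one chunk (hypothetical wording)."""
    return (
        f"Write {n_questions} quiz questions that test understanding "
        f"of the following lecture excerpt:\n\n{chunk}"
    )


# Placeholder text standing in for a transcribed lecture.
transcript = "Atomization energy is the energy required to break a molecule into atoms"
prompts = [question_prompt(c) for c in chunk_transcript(transcript, max_words=6)]
```

Chunking keeps each prompt focused on one part of the lecture, so the generated questions can be tied back to a specific segment of the video.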

Challenges and Future of LLMs in Science

00:08:47
Speaker
While the hackathon projects showcase the immense potential of LLMs in scientific research and education, there are still challenges to overcome. For one, many of the models used in the hackathon, such as GPT-4, are proprietary, meaning researchers don't have full control over how these models are built or trained.
00:09:02
Speaker
There's also the issue of reliability. LLMs can sometimes produce errors or generate outputs that don't fully adhere to the requested structure, as we saw with the TableToJson project. There's also the question of accessibility. While tools like OpenAI's API make it easy to use these models, open-source alternatives are more difficult to implement, and they don't perform as well without significant fine-tuning.
00:09:25
Speaker
Another key challenge is the need for new benchmarks to evaluate the performance of LLMs in scientific contexts. Traditional benchmarks, which focus on tabular data, are not enough when we're dealing with unstructured data or complex workflows. Researchers need more sophisticated tools to measure how well LLMs are performing in specific scientific tasks,

LLMs as Creative Partners in Innovation

00:09:47
Speaker
particularly when they rely on context or external tools. Despite these challenges, the future looks incredibly bright. The hackathon was just the beginning. We're seeing the dawn of a new era where AI will be an indispensable partner in scientific research, helping us accelerate discoveries and tackle some of the most complex problems in chemistry, material science, and beyond.
00:10:10
Speaker
Before we wrap up, I want to highlight something truly fascinating that emerged from the hackathon: the potential for LLMs to act not just as tools, but as creative partners in scientific innovation. You've probably heard of rubber duck debugging,
00:10:22
Speaker
where explaining your problem to an inanimate object helps clarify your thinking. Now imagine explaining your problems to an LLM, which can probe your thinking with follow-up questions, suggest alternative approaches, or even generate new hypotheses for you to explore. The idea opens up new ways of using AI, not just for solving problems, but for generating creative breakthroughs. LLMs could become collaborators in research, offering fresh perspectives or uncovering connections that researchers might not see on their own.
00:10:46
Speaker
This is something we're just starting to explore, but the possibilities are endless. As LLMs become more sophisticated and capable of handling more complex reasoning, we could see a future where AI isn't just helping us with the technical details, it's helping us to drive innovation. As we've seen today, large language models like GPT-4 are much more than just powerful text generators. They're tools that have the potential to revolutionize science, education, and creativity.
00:11:12
Speaker
From predictive modeling and automating workflows to extracting knowledge from scientific literature and transforming how we teach, LLMs are going to be foundational technologies in the years to come. The hackathon showed us what's possible, but we're only scratching the surface. As we continue to refine these models and overcome the challenges ahead, we're likely to see AI becoming an integral part of the research process, not just in chemistry and material science, but across all scientific fields. That's all for today's episode of Breaking Math. We hope you enjoyed this deep dive into the world of AI and its potential to transform science. Don't forget to subscribe to the podcast, follow us on social media, and let us know what topics you'd like us to explore next.
00:11:55
Speaker
Thank you for joining us, and remember, math is everywhere, even in the AI systems that are shaping the future of science. Until next time, keep exploring, keep questioning, and keep breaking math.